Early trajectory of clinical global impression as a transdiagnostic predictor of psychiatric hospitalisation: a retrospective cohort study

Summary

Background: Identifying patients most at risk of psychiatric hospitalisation is crucial to improving service provision and patient outcomes. Existing predictors focus on specific clinical scenarios and are not validated with real-world data, limiting their translational potential. This study aimed to determine whether early trajectories of Clinical Global Impression Severity are predictors of 6-month risk of hospitalisation.

Methods: This retrospective cohort study used data from the NeuroBlu database, an electronic health records network from 25 US mental health-care providers. Patients with an ICD-9 or ICD-10 code of major depressive disorder, bipolar disorder, generalised anxiety disorder, post-traumatic stress disorder, schizophrenia or schizoaffective disorder, ADHD, or personality disorder were included. Using this cohort, we assessed whether clinical severity and instability (operationalised using Clinical Global Impression Severity measurements) during a 2-month period were predictors of psychiatric hospitalisation within the next 6 months.

Findings: 36 914 patients were included (mean age 29·7 years [SD 17·5]; 21 156 [57·3%] female, 15 748 [42·7%] male; 20 559 [55·7%] White, 4842 [13·1%] Black or African American, 286 [0·8%] Native Hawaiian or other Pacific Islander, 300 [0·8%] Asian, 139 [0·4%] American Indian or Alaska Native, 524 [1·4%] other or mixed race, and 10 264 [27·8%] of unknown race). Clinical severity and instability were independent predictors of risk of hospitalisation (adjusted hazard ratio [HR] 1·09, 95% CI 1·07–1·10 for every SD increase in instability; 1·11, 1·09–1·12 for every SD increase in severity; p<0·0001 for both). These associations were consistent across all diagnoses, age groups, and in both males and females, as well as in several robustness analyses, including when clinical severity and clinical instability were based on the Patient Health Questionnaire-9 rather than Clinical Global Impression Severity measurements. Patients in the top half of the cohort for both clinical severity and instability were at an increased risk of hospitalisation compared with those in the bottom half along both dimensions (HR 1·45, 95% CI 1·39–1·52; p<0·0001).

Interpretation: Clinical instability and severity are independent predictors of future risk of hospitalisation, across diagnoses, age groups, and in both males and females. These findings could help clinicians make prognoses and screen patients who are most likely to benefit from intensive interventions, as well as help health-care providers plan service provisions by adding detail to risk prediction tools that incorporate other risk factors.

Funding: National Institute for Health and Care Research, National Institute for Health and Care Research Oxford Health Biomedical Research Centre, Medical Research Council, Academy of Medical Sciences, and Holmusk.


Introduction
Any time series (i.e. sequence of consecutive measurements) can be summarised by a set of simple values (sometimes referred to as summary statistics). The mean of all values is one of the simplest summary statistics. If one is interested in how values are spread around the mean, then one might also calculate the standard deviation of the values. Sets of values that are not time series are often summarised simply by their mean and standard deviation, which represent the coarse-grain aspects of their distribution. However, these two summary statistics are often insufficient to represent even coarse-grain aspects of time series: time series with the same mean and standard deviation can look very different (Fig. S1). For instance, the time series in Fig. S1C does not fluctuate at all. We refer to the degree of fluctuation of a time series as its instability.[1] In the next subsection, we formalise how the idea of instability can be translated into a number (i.e. a summary statistic).

Operationalization
One way to start operationalizing the concept of instability introduced above is to notice that the time series in Fig. S1 differ by how much the value can change from one time point to the next. In Fig. S1A, the value can go from one extreme to the other in a single step, whereas in Fig. S1B there are always intermediate steps between extreme values. We can therefore quantify the instability of the time series via the difference between two consecutive values.
Denoting by $x_i$ the i-th value of the time series, the difference with the subsequent value is $x_{i+1} - x_i$. Because we are interested in the magnitude of the difference but not its sign, we can focus on the squared differences: $(x_{i+1} - x_i)^2$. Averaging over all pairs of successive values gives us the mean square of successive differences, and taking the square root of the result gives us the root mean square of successive differences (RMSSD) [3][4]:

$$\mathrm{RMSSD} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}\left(x_{i+1} - x_i\right)^2}$$

where $N$ is the number of values in the time series.

However, this definition requires one last adaptation. So far, we have assumed that values of the time series were measured at constant intervals (e.g. in Fig. S1, there is one value for every unit of time). In clinical practice, where CGI-S is recorded during different clinical encounters, the intervals between subsequent recordings might vary. This is an important consideration because differences between values measured further apart in time might be larger, and this might not reflect faster fluctuations of the time series. An example of this is represented in Fig. S2, which shows both the original time series of Fig. S1B and a version of the same time series in which 30% of values have been removed at random. Because the overall shape of the curve is the same in the two scenarios, their instability should be the same. This can be achieved by adjusting the successive differences for time. If we denote by $t_i$ the time at which the value $x_i$ was recorded, then instead of focussing on the difference between subsequent values, we can focus on the rate at which the value changes with time:

$$\frac{x_{i+1} - x_i}{t_{i+1} - t_i}$$

Instead of averaging the squared differences, we can average the squared rates of change. Taking the square root of the result gives us the time-adjusted root mean square of successive differences (tRMSSD), which we used in this study:

$$\mathrm{tRMSSD} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N-1}\left(\frac{x_{i+1} - x_i}{t_{i+1} - t_i}\right)^2}$$

If a measurement is available for every unit of time, then $t_{i+1} - t_i = 1$ and tRMSSD is the same as RMSSD. Of note, the quantity $(x_{i+1} - x_i)/(t_{i+1} - t_i)$ can also be interpreted as the slope of the line that connects two subsequent values. This geometrical interpretation means that if unrecorded values lie on the line that connects measured values, then adding or removing them makes little difference to the tRMSSD.
Using this formula, we can now calculate the instability for the time series in Fig. S1: 1.41 in A, 0.59 in B, and 0.03 in C.
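As an illustration of the tRMSSD formula, the calculation can be sketched in a few lines of Python. This is a minimal sketch and not the authors' own code: the function name `trmssd` is hypothetical, and it assumes that measurement times are supplied as numbers (e.g. days since the first measurement) rather than as date objects.

```python
import math
from typing import Sequence


def trmssd(times: Sequence[float], values: Sequence[float]) -> float:
    """Time-adjusted root mean square of successive differences.

    Assumption (illustrative): `times` are numeric, e.g. days since
    the first measurement, and no two measurements share the same time.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if len(values) < 2:
        raise ValueError("at least two measurements are needed")

    # Order the measurements chronologically before differencing.
    pairs = sorted(zip(times, values))

    squared_rates = []
    for (t0, x0), (t1, x1) in zip(pairs, pairs[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("measurement times must be distinct")
        # Squared slope of the line joining two successive measurements.
        squared_rates.append(((x1 - x0) / dt) ** 2)

    # Mean over the N-1 successive pairs, then square root.
    return math.sqrt(sum(squared_rates) / len(squared_rates))
```

With one measurement per unit of time the function reduces to the RMSSD, e.g. `trmssd([0, 1, 2, 3], [0, 2, 1, 3])` equals the square root of 3. Dropping a point that lies on the line joining its neighbours leaves the result unchanged in simple cases such as `trmssd([0, 1, 2], [0, 1, 2])` versus `trmssd([0, 2], [0, 2])`.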

Code
The tRMSSD is easily calculated using R (or any other programming language for numeric calculation). Taking as input a vector of measurement times (e.g. timestamps or dates), which we denote times, and a vector of values measured at those times, which we denote values, the tRMSSD can be calculated with a short function in R.

Why might clinical instability predict hospitalization: a mathematical example
So far, we have seen what clinical instability is and how it can be operationalised. In this section, we provide an intuition for its association with later hospitalization. We use a simplistic mathematical model to illustrate the main point.
Let us assume that a patient's clinical state can be summarised by a single number, with higher values representing higher severity. Let us also assume that this value can change from day to day and that, over a certain threshold, the patient requires hospitalization. Now let us consider two groups of patients: group A and group B. Patients in group A have clinical states that are simply random numbers drawn every day from a Normal distribution with mean 0 and standard deviation 1. In mathematical terms, the clinical state $C_i$ at time i is:

$$C_i \sim \mathcal{N}(0,1)$$

To define the clinical states of patients in group B, we start with a sequence of random numbers $\varepsilon_i \sim \mathcal{N}(0,1)$ similar to group A, but this time the clinical state at time i is a weighted average of the current value $\varepsilon_i$ and the previous 5 values $\varepsilon_{i-1}, \ldots, \varepsilon_{i-5}$, scaled by a coefficient chosen so that the standard deviation of the clinical state in group B is 1 (as in group A). The clinical states in group A and group B therefore have the same mean (= 0) and the same standard deviation (= 1). The difference between the two groups lies in how consecutive clinical states relate to one another.

Description of the network for the PHQ-9 data
The data for the primary analysis and all but one of the secondary analyses are based on the NeuroDB 21R1 dataset, which is fully described in a cohort profile paper.[5] The secondary analysis based on PHQ-9 data uses a separate network of U.S. health care organizations, which comprises the following:
1) A large behavioural and developmental disability care centre serving over 88,000 people per year. The centre serves a high percentage of uninsured and Medicaid patients across its outpatient clinics, which provide different mental health and substance use services. Though the centre provides outpatient services only, it keeps records of patient hospitalization and transfer details with centre (2), to and from which patients are often transferred for inpatient services.
2) A large provider of inpatient psychiatric care, which serves a high percentage of uninsured and Medicaid patients. This centre has some patient overlap with centre (1), to and from which patients are often transferred for outpatient psychiatric services.
3) Specialty care clinics and neighbourhood-based health centres, which serve a high percentage of uninsured and Medicaid patients and handle more than 45,000 outpatient visits and 1,200 in-patient admissions annually. The specialties include trauma care, cancer care, women's health, primary care and behavioural health.
4) A large provider of mental health services serving over 55,000 adults and children annually. Additionally, the centre provides primary care and serves homeless adults, youth, and families who have a mental health and/or a substance abuse challenge. The centre offers care through outpatient facilities that receive patients across 4 different neighbourhood hospitals and has a high percentage of uninsured and Medicaid patients.

Diagnostic codes
The following diagnostic codes were used to define diagnoses in the NeuroBlu data.

Translating the effect size into clinically relevant effect
In the discussion, we illustrate how the effect size might translate into lower admission rates.
Here we provide the details of how we calculated these.
We let:
- E be the effectiveness of the intervention, i.e. the reduction in probability of hospitalization for a patient who would otherwise have been admitted had they not received the intervention. We set E=80% in our example, to correspond to the effectiveness of crisis resolution teams.
- B be the baseline admission rate in a population. We set B=50% to correspond to a population of patients in crisis.
- C be the capacity of the service, i.e. the proportion of patients who can receive the intervention. We set C=25% in our example.
From a population of 1000 people, if we were to apply the intervention at random, we would select 250 patients randomly (= C*1000). Out of those 250 patients, 125 would have been hospitalized if they had not received the intervention (= B*250). However, because the intervention is effective, only 25 (= 125*(1-E)) will be hospitalized. This is why we claimed that in this simple example, 100 admissions would have been prevented by the intervention.

Now let's assume that instead of applying the intervention at random, we apply it to the patients in the top half of the population in terms of both clinical severity and instability (i.e. the 25% of the population at highest risk of hospitalization). In that population, separated into quadrants (and assuming that 25% of the population falls in each quadrant), the risks of hospitalization are as follows (these can be derived from simple algebra to reach a total risk of 50% in the population while reflecting the hazard ratios seen in Fig. 3a in the paper):
- Bottom half in terms of both clinical severity and instability: 39.6%
- Bottom half in terms of severity, but top half in terms of instability: 51.1% (= 39.6%*1.29)
- Top half in terms of severity, but bottom half in terms of instability: 51.9% (= 39.6%*1.31)
- Top half in terms of both severity and instability: 57.4% (= 39.6%*1.45)
Within the latter group, there would be 144 admissions without the intervention (=57.4%*250).However, with the intervention, this number reduces to 29 (=144*(1-E)), hence 115 admissions have been prevented by the intervention (=144-29), or 15 more than if the intervention was applied at random.
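The arithmetic of this worked example can be retraced with a short Python sketch. The parameter values and hazard ratios are those quoted in the text. The variable and function names are hypothetical. Like the text, the sketch treats the hazard ratios as simple risk multipliers and assumes four quadrants of equal size.

```python
# Parameters taken from the worked example in the text.
E = 0.80           # effectiveness of the intervention
B = 0.50           # baseline admission rate in the population
C = 0.25           # capacity: proportion of patients who can be treated
POPULATION = 1000

# Hazard ratios relative to the low-severity, low-instability quadrant.
# Assumption (as in the text): they are used as risk multipliers.
HAZARD_RATIOS = {
    "low severity, low instability": 1.00,
    "low severity, high instability": 1.29,
    "high severity, low instability": 1.31,
    "high severity, high instability": 1.45,
}


def admissions_prevented(risk_in_treated_group: float) -> float:
    """Admissions avoided when C*POPULATION patients with this risk are treated."""
    treated = C * POPULATION
    return treated * risk_in_treated_group * E


# Risk in the reference quadrant such that the average over four
# equally sized quadrants equals the baseline admission rate B.
reference_risk = B * len(HAZARD_RATIOS) / sum(HAZARD_RATIOS.values())
quadrant_risks = {name: reference_risk * hr for name, hr in HAZARD_RATIOS.items()}

prevented_random = admissions_prevented(B)
prevented_targeted = admissions_prevented(
    quadrant_risks["high severity, high instability"]
)
```

Without intermediate rounding, `prevented_targeted` comes out at about 115 and `prevented_random` at 100, which matches the figures above, where each step is rounded to a whole number of admissions.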

Fig. S1 - Example time series with the same mean (= 0) and standard deviation (= 1).

Fig. S2 - (A) Original time series presented in Fig. S1B. (B) The same time series after 30% of values have been removed at random.

In group A, two consecutive clinical states are independent from one another (i.e. knowing the clinical state at time i does not provide any information about the clinical state at time i+1). By contrast, in group B two consecutive clinical states are highly correlated because they share many of the terms in their definition. The time series illustrated in Fig. S1 are actually examples of clinical states from group A (Fig. S1A) and group B (Fig. S1B), and we have already seen that instability in Fig. S1A is higher than in Fig. S1B. To assess whether the higher clinical instability in group A predicts a higher risk of hospitalization, we can simulate 10,000 patients in each group, each having a time series of 1000 clinical states. Assuming that hospitalization is necessary (in this simplistic model) whenever the clinical state exceeds 3.5, we can then count the number of patients who meet this criterion in group A and group B. The results from this simulation are presented in Fig. S3. Significantly fewer patients in group B meet the 'hospitalization' criterion than in group A (1518 vs. 2064 out of 10,000; χ² = 101, df = 1, p < 0.0001).
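The simulation described above can be sketched in Python using only the standard library. This is an illustrative reconstruction, not the authors' code. In particular, it assumes that group B uses equal weights for the current and the previous 5 random numbers, scaled by 1/√6 so that the standard deviation is 1. The exact weights are not stated in the text. All function names are hypothetical.

```python
import math
import random


def simulate_group_a(n_days: int, rng: random.Random) -> list:
    """Independent daily clinical states drawn from N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_days)]


def simulate_group_b(n_days: int, rng: random.Random, window: int = 6) -> list:
    """Correlated clinical states: scaled sum of the current and previous 5 draws.

    ASSUMPTION: equal weights, scaled by 1/sqrt(window) for unit variance.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_days + window - 1)]
    scale = 1.0 / math.sqrt(window)
    running = sum(noise[:window])          # sum over the first full window
    states = [running * scale]
    for i in range(window, len(noise)):
        running += noise[i] - noise[i - window]   # slide the window by one day
        states.append(running * scale)
    return states


def proportion_hospitalised(simulate, n_patients: int, n_days: int,
                            threshold: float, rng: random.Random) -> float:
    """Share of simulated patients whose state ever exceeds the threshold."""
    hits = sum(
        max(simulate(n_days, rng)) > threshold for _ in range(n_patients)
    )
    return hits / n_patients
```

For example, `proportion_hospitalised(simulate_group_a, 10_000, 1000, 3.5, random.Random(1))` approximates the group A result. A pure-Python run of this size may take a noticeable amount of time, so a smaller `n_patients` is convenient for a quick check. For group A the expected proportion can be verified analytically as 1 − (1 − P(Z > 3.5))^1000 ≈ 0.21, in line with the 2064 out of 10,000 reported above. Under the equal-weights assumption, the expected instability in group B is √(1/3) ≈ 0.58, close to the 0.59 reported for Fig. S1B.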

Fig. S3 - 10,000 simulated trajectories in Group A and B with the same mean and standard deviation.

The results from this simulation might at first seem remarkable. After all, each individual value of the clinical state follows the same distribution (a normal distribution with mean 0 and standard deviation 1) in groups A and B. So why is it more likely to observe an extreme value in group A than in group B? The reason is that, in group A, every single day gives a new chance to observe an extreme value, whereas in group B values are correlated with the previous ones, which decreases the risk of observing extreme values. An analogy for group A would be the chance of observing at least one 6 when throwing two dice independently, whereas in group B the two dice are first glued together so that they always display the same number. The chance of observing at least one 6 in the latter case is substantially lower (1 in 6 vs. 11 in 36).
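The two probabilities in the dice analogy can be checked by enumerating all outcomes. The Python sketch below is purely illustrative and the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)

# Two independent dice: 36 equally likely pairs, count those showing a 6.
independent_outcomes = list(product(FACES, repeat=2))
p_independent = Fraction(
    sum(1 for a, b in independent_outcomes if 6 in (a, b)),
    len(independent_outcomes),
)

# Two dice glued together: both always show the same face, so there are
# only 6 equally likely outcomes and one of them shows a 6.
p_glued = Fraction(sum(1 for face in FACES if face == 6), len(FACES))
```

Here `p_independent` is 11/36 and `p_glued` is 1/6, as stated in the text.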

Table S1 - Association between data missingness and exposure/outcome

Table S2 - Summary statistics for the distribution of clinical severity in the whole cohort (i.e. marginal distribution) and in those with clinical instability in the bottom and top halves of the population. The distributions are very similar, indicating that a wide range of clinical severity can be observed for different levels of clinical instability.

Table S3 - Number of individuals and number of events in the analyses where clinical severity and instability are dichotomised.

Table S4 - Baseline characteristics of the extended sample wherein the measurement period was set to 6 rather than 2 months.

Table S5 - Baseline characteristics of the cohort in whom PHQ-9 was measured repeatedly over a one-year phenotyping period.

The RECORD statement: checklist of items, extended from the STROBE statement, that should be reported in observational studies using routinely collected health data.
Reference: Benchimol EI, Smeeth L, Guttmann A, Harron K, Moher D, Petersen I, Sørensen HT, von Elm E, Langan SM, the RECORD Working Committee. The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) Statement. PLoS Medicine 2015; in press.
**Checklist is protected under Creative Commons Attribution (CC BY) license.